Instance/Logo Search
Introduction
Instance search aims to find a specific object in large-scale image or video datasets. In object retrieval, a bounding box or a shape is often supplied to delimit the query entity, such as a person, place, or other object. The bag-of-words (BoW) framework has long been the standard approach for large-scale image and visual object retrieval, and many schemes derived from it achieve state-of-the-art performance on several benchmarks. Our work builds on this framework as well, covering visual dictionary learning, embedding and aggregating of local features, and high-dimensional indexing.
Framework
- Visual dictionary learning
- Feature embedding and aggregating
- High-dimensional indexing
The framework of this project consists of three parts: visual dictionary learning, feature embedding and aggregating, and high-dimensional indexing.
Visual dictionary learning is a crucial step in image representation and has gained increasing attention lately. To learn a more efficient and more informative visual dictionary, we leverage textual tags during dictionary learning and propose a novel sparse coding model that captures the sparse latent semantic structures that the visual features and the tags have in common.
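For readers unfamiliar with sparse coding, the following is a minimal sketch of the generic coding step only, not the tag-aware model proposed in this project. It assumes a fixed dictionary and solves the standard L1-penalized least-squares problem with iterative soft thresholding (ISTA). The function names, the `penalty` value, and the toy dictionary are all illustrative assumptions.

```python
def soft_threshold(value, threshold):
    """Shrink a scalar toward zero by `threshold` (proximal step of the L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sparse_code(x, atoms, penalty=0.1, iterations=200):
    """ISTA for min_a 0.5 * ||x - sum_k a[k] * atoms[k]||^2 + penalty * ||a||_1.

    `atoms` is a list of dictionary atoms (visual words), each as long as `x`.
    """
    # Safe step size: the squared Frobenius norm of the dictionary bounds
    # the Lipschitz constant of the gradient from above.
    step = 1.0 / sum(v * v for atom in atoms for v in atom)
    codes = [0.0] * len(atoms)
    for _ in range(iterations):
        reconstruction = [
            sum(a * atom[i] for a, atom in zip(codes, atoms)) for i in range(len(x))
        ]
        residual = [xi - ri for xi, ri in zip(x, reconstruction)]
        codes = [
            soft_threshold(
                a + step * sum(r * v for r, v in zip(residual, atom)),
                step * penalty,
            )
            for a, atom in zip(codes, atoms)
        ]
    return codes


# Hypothetical toy dictionary with two orthonormal atoms over 2-D features.
atoms = [(1.0, 0.0), (0.0, 1.0)]
codes = sparse_code((1.0, 0.05), atoms)
```

In this toy case the small second coordinate falls below the penalty, so only the first atom receives a nonzero coefficient. That is the sparsity the model relies on.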
Traditional BoW-based feature embedding and aggregating methods retain little information in the final image representation. We therefore embed additional information, such as angle and distance, to produce a richer image representation that characterizes individual images better.
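To make the limitation concrete, here is a minimal sketch of the traditional BoW baseline, assuming hard assignment of each local descriptor to its nearest visual word. It is not the enriched representation described above. The function names and the toy data are illustrative assumptions.

```python
import math


def nearest_word(descriptor, dictionary):
    """Return the index of the visual word closest to the descriptor (squared L2)."""
    best_index, best_dist = 0, float("inf")
    for index, word in enumerate(dictionary):
        dist = sum((d - w) ** 2 for d, w in zip(descriptor, word))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def bow_histogram(descriptors, dictionary):
    """Aggregate local descriptors into an L2-normalized word-count histogram."""
    counts = [0.0] * len(dictionary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, dictionary)] += 1.0
    norm = math.sqrt(sum(c * c for c in counts))
    return [c / norm for c in counts] if norm > 0 else counts


# Toy data: a 3-word dictionary over 2-D descriptors (real descriptors are
# much higher-dimensional, and real dictionaries are much larger).
dictionary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 0.1), (1.1, -0.1), (0.0, 0.8)]
histogram = bow_histogram(descriptors, dictionary)
```

The histogram records only which word each descriptor fell into. How far the descriptor lies from that word, and in which direction, is discarded, and that is the kind of information the angle and distance cues are meant to restore.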
Since image representations are high-dimensional vectors, an efficient indexing system is necessary. We use different indexing structures for different scenarios, such as tree-based indexing structures, hashing methods, and product quantization.
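As an illustration of one of these options, the sketch below shows the core idea of product quantization. Each vector is split into subvectors, each subvector is replaced by the index of its nearest centroid, and a raw query is compared against the stored codes through per-subspace lookup tables. The codebooks are hard-coded toy values (in practice they would be learned, for example with k-means), and all names are illustrative assumptions rather than this project's implementation.

```python
def split(vector, num_subspaces):
    """Split a vector into equally sized contiguous subvectors."""
    size = len(vector) // num_subspaces
    return [vector[i * size:(i + 1) * size] for i in range(num_subspaces)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equally long sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def encode(vector, codebooks):
    """Replace each subvector by the index of its nearest centroid."""
    code = []
    for sub, centroids in zip(split(vector, len(codebooks)), codebooks):
        code.append(
            min(range(len(centroids)), key=lambda k: sq_dist(sub, centroids[k]))
        )
    return code


def asymmetric_distance(query, code, codebooks):
    """Approximate squared distance between a raw query and an encoded vector."""
    subs = split(query, len(codebooks))
    # One lookup table per subspace: query subvector to every centroid.
    tables = [
        [sq_dist(sub, c) for c in centroids]
        for sub, centroids in zip(subs, codebooks)
    ]
    return sum(table[k] for table, k in zip(tables, code))


# Hypothetical toy codebooks: 2 subspaces with 2 centroids each.
codebooks = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(0.0, 1.0), (1.0, 0.0)],
]
database = [(0.1, 0.0, 0.9, 0.1), (0.9, 1.0, 0.0, 1.1)]
codes = [encode(v, codebooks) for v in database]

query = (1.0, 0.9, 0.1, 1.0)
ranking = sorted(
    range(len(codes)),
    key=lambda i: asymmetric_distance(query, codes[i], codebooks),
)
```

Each database vector is stored as a short list of centroid indices instead of its full coordinates. This is what keeps memory use and distance computation manageable for large collections, at the cost of approximate distances.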